arxiv preprint
Towards Federated Foundation Models: Scalable Dataset Pipelines for Group-Structured Learning Zachary Charles
We introduce Dataset Grouper, a library to create large-scale group-structured (e.g., federated) datasets, enabling federated learning simulation at the scale of foundation models. This library facilitates the creation of group-structured versions of existing datasets based on user-specified partitions, and directly leads to a variety of useful heterogeneous datasets that can be plugged into existing software frameworks. Dataset Grouper offers three key advantages. First, it scales to settings where even a single group's dataset is too large to fit in memory. Second, it provides flexibility, both in choosing the base (non-partitioned) dataset and in defining partitions.
- North America > United States > Virginia (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (0.93)
Parallelizing Linear Transformers with the Delta Rule over Sequence Length Songlin Y ang Bailin Wang Y u Zhang Yikang Shen Y oon Kim Massachusetts Institute of Technology Soochow University
Transformers with linear attention (i.e., linear transfor mers) and state-space models have recently been suggested as a viable linear-time alt ernative to transformers with softmax attention. However, these models still underp erform transformers especially on tasks that require in-context retrieval. Whil e more expressive variants of linear transformers which replace the additive upda te in linear transformers with the delta rule [DeltaNet; 101 ] have been found to be more effective at associative recall, existing algorithms for training such mode ls do not parallelize over sequence length and are thus inefficient to train on modern ha rdware. This work describes a hardware-efficient algorithm for training line ar transformers with the delta rule, which exploits a memory-efficient representati on for computing products of Householder matrices [ 11 ]. This algorithm allows us to scale up DeltaNet to standard language modeling settings. We train a 1.3B mode l for 100B tokens and find that it outperforms recent linear-time baselines su ch as Mamba [ 31 ] and GLA [ 124 ] in terms of perplexity and zero-shot performance on downst ream tasks. We also experiment with two hybrid models which combine Delt aNet layers with (1) sliding-window attention layers every other layer or (2) two global attention layers, and find that these hybrids outperform strong transf ormer baselines.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (19 more...)
- Education (0.67)
- Health & Medicine (0.48)
- Asia > Middle East > Jordan (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
Experiments of pretraining 410M and 1B models on the C4 dataset demonstrate that MA TES significantly outperforms random data selection on extensive downstream tasks. It doubles the gains achieved by the state-of-the-art data selection approach that leverages larger reference models and reduces the total FLOPs required to reach certain performances by half. Further analyses validate the effectiveness of the locally probed oracle data influence and the approximation with data influence models. Our code is open-sourced at https://github.com/cxcscmu/MA
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan (0.04)
- Europe (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Media > Film (0.46)
- Leisure & Entertainment (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
- Asia > Indonesia > Bali (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Pacific Ocean (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (3 more...)
- Information Technology (1.00)
- Banking & Finance > Trading (1.00)
- Health & Medicine > Therapeutic Area (0.92)
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)